Main Contributing Factors
To determine the contributing factors of collisions, I will use the
contributing_factor_vehicle_x columns from the
crashes data set. The data set will have to be
transformed to a long format for this.
contributing_factors_by_crash <- crashes |>
select(collision_id, crash_date, crash_time, borough, c(contributing_factor_vehicle_1:contributing_factor_vehicle_5))
contributing_factors <- contributing_factors_by_crash |>
pivot_longer(col = c(contributing_factor_vehicle_1:contributing_factor_vehicle_5),
names_to = "vehicle",
values_to = "factor") |>
mutate(factor = case_when(str_detect(factor, "Cell Phone") ~ "Cell Phone",
str_detect(factor, "Drugs") ~ "Drugs",
str_detect(factor, "Ill") ~ "Illness",
str_detect(factor, "Uninvolved Vehicle") ~ "Reaction to Uninvolved Vehicle",
TRUE~factor)) |>
filter(!is.na(factor), !factor %in% c("Unspecified", "1", "80"))
contributing_factors |>
count(factor) |>
arrange(desc(n)) |>
mutate(prop = n / sum(n)) |>
head(10) |>
ggplot(aes(x = prop, y = reorder(factor, n))) +
geom_bar(stat="identity", fill = "steelblue") +
scale_x_continuous(label = scales::percent) +
labs(title = "Top 10 Contributing Factors", x = "Percentage of Crashes", y = "Factor")
The main contributing factor in motor vehicle collisions is driver inattention/distraction, which accounts for about 31% of the causes for collisions within this data set. Other contributing factors include failure to yield to right-of-way, following too closely, and other driver action related factors, as well as driver fatigue.
contributing_factors |>
select(-vehicle) |>
unique() |> # remove replicated crashes (i.e. multiple drivers in one collision with same causal factor)
count(factor) |>
arrange(desc(n)) |>
head(1)
## # A tibble: 1 × 2
## factor n
## <chr> <int>
## 1 Driver Inattention/Distraction 416287
416,287 collisions in this data set were caused by one or more drivers being inattentive or distracted.
What is the main contributing factor for crashes occurring at each hour of the day?
Since driver inattention seems to be the most overwhelming factor in this data set, I will filter that out as the top factor and will check to see the second most common factor for each hour.
contributing_factors |>
filter(!str_detect(factor, "Inattention/")) |>
mutate(hour = hour(crash_time)) |>
group_by(hour) |>
count(factor) |>
filter(n == max(n)) |>
knitr::kable(col.names = c("Hour", "Factor", "Number of Collisions"))
| Hour | Factor | Number of Collisions |
|---|---|---|
| 0 | Following Too Closely | 4003 |
| 1 | Other Vehicular | 1782 |
| 2 | Alcohol Involvement | 1590 |
| 3 | Alcohol Involvement | 1609 |
| 4 | Alcohol Involvement | 1826 |
| 5 | Following Too Closely | 1425 |
| 6 | Following Too Closely | 3006 |
| 7 | Following Too Closely | 4448 |
| 8 | Failure to Yield Right-of-Way | 8204 |
| 9 | Failure to Yield Right-of-Way | 7141 |
| 10 | Failure to Yield Right-of-Way | 6426 |
| 11 | Failure to Yield Right-of-Way | 6571 |
| 12 | Failure to Yield Right-of-Way | 7383 |
| 13 | Failure to Yield Right-of-Way | 7991 |
| 14 | Failure to Yield Right-of-Way | 9244 |
| 15 | Following Too Closely | 9122 |
| 16 | Failure to Yield Right-of-Way | 10321 |
| 17 | Failure to Yield Right-of-Way | 10737 |
| 18 | Failure to Yield Right-of-Way | 9454 |
| 19 | Failure to Yield Right-of-Way | 7712 |
| 20 | Failure to Yield Right-of-Way | 6259 |
| 21 | Failure to Yield Right-of-Way | 5069 |
| 22 | Failure to Yield Right-of-Way | 4183 |
| 23 | Failure to Yield Right-of-Way | 3117 |
Between the hours of 2-4 AM, driver inebriation from alcohol involvement is the main factor contributing to collisions. For all other hours, following too closely or failure to yield to right of way is the main contributing factor.
How many collisions are caused by alcohol or drug involvement?
Overall
contributing_factors |>
filter(str_detect(tolower(factor), "drugs|alcohol")) |>
select(-vehicle) |>
unique() |> # remove duplicate crashes
nrow()
## [1] 23486
There are 23,486 incidents of alcohol or drug involvement in car accidents recorded in this data set.
How many of these accidents resulted in fatalities or serious injury?
intox_fatalities_injuries <- fatalities_and_injuries |>
pivot_longer(col = c(contributing_factor_vehicle_1:contributing_factor_vehicle_5),
names_to = "vehicle",
values_to = "factor") |>
filter(str_detect(tolower(factor), "drugs|alcohol")) |>
select(-vehicle) |>
unique()
intox_fatalities_injuries |>
nrow()
## [1] 7486
intox_fatalities_injuries |>
mutate(injury = ifelse(number_persons_injured > 0, 1, 0),
fatality = ifelse(number_persons_killed > 0, 1, 0)) |>
summarize(total_injury_inducing = sum(injury),
total_fatality_inducing = sum(fatality))
## # A tibble: 1 × 2
## total_injury_inducing total_fatality_inducing
## <dbl> <dbl>
## 1 7439 117
There are 7,486 fatality/injury inducing accidents caused by alcohol/drug involvement of one or more drivers, 7,439 of which caused severe injuries and 117 of which resulting in death.
Per Year (Table)
contributing_factors |>
filter(str_detect(tolower(factor), "drugs|alcohol")) |>
count(year = year(crash_date)) |>
knitr::kable(col.names = c("Year", "Number of Collisions Involving Intoxicated Drivers"))
| Year | Number of Collisions Involving Intoxicated Drivers |
|---|---|
| 2012 | 856 |
| 2013 | 1760 |
| 2014 | 2261 |
| 2015 | 2201 |
| 2016 | 3033 |
| 2017 | 2848 |
| 2018 | 2577 |
| 2019 | 2389 |
| 2020 | 1741 |
| 2021 | 1924 |
| 2022 | 1953 |
| 2023 | 568 |
Per Year (Graph)
contributing_factors |>
filter(str_detect(tolower(factor), "drugs|alcohol")) |>
select(-vehicle) |>
unique() |>
count(year = as.factor(year(crash_date))) |>
ggplot(aes(x = year, y = n)) +
geom_bar(stat="identity", fill = "steelblue4") +
labs(title = "Number of Collisions Involving Intoxicated* Drivers per Year", x = "Year", y = "Number of Accidents", caption = "*Drugs or Alcohol") +
geom_text(aes(label = n), vjust = -0.5, size=3)
The most recorded collisions from alcohol/drug involvement was in 2016, with 2,925 accidents occurring in relation to intoxicated drivers. Since Vision Zero was only implemented in 2016, the number of collisions from 2012-2015 may not be complete so these numbers may not accurately reflect the true number of crashes caused by alcohol/drug involvement. Likewise, the data for 2023 is not complete, as the data was accessed about five months into the year and only accounts for collisions within those five months. Since the implementation of Vision Zero in 2016, it seems that collisions resulting from drug or alcohol intoxication has decreased till 2020 and then started to increase again slightly. The major dip from 2019 to 2020 may have been caused by stay-at-home policies due to Covid-19 resulting in less drivers overall on roads during this time.
Per Hour of Day
contributing_factors |>
filter(str_detect(tolower(factor), "drugs|alcohol")) |>
select(-vehicle) |>
unique() |>
count(hour = as.factor(hour(crash_time))) |>
ggplot(aes(x = hour, y = n)) +
geom_bar(stat="identity", fill = "steelblue4") +
labs(title = "Number of Collisions Involving Intoxicated* Drivers per Hour of Day", x = "Hour of Day", y = "Number of Accidents", caption = "*Drugs or Alcohol") +
geom_text(aes(label = n), vjust = -0.5, size=2.5)
The most collisions occur from driver intoxication in the early and later hours of the day (from midnight to 5 AM and rising steadily from 4 to 11 PM).
Across Boroughs
contributing_factors |>
filter(str_detect(tolower(factor), "drugs|alcohol")) |>
select(-vehicle) |>
filter(!is.na(borough)) |>
unique() |>
count(borough) |>
ggplot(aes(x = borough, y = n)) +
geom_bar(stat="identity", fill = "steelblue4") +
labs(title = "Number of Collisions Involving Intoxicated* Drivers by Borough", x = "Borough", y = "Number of Accidents", caption = "*Drugs or Alcohol") +
geom_text(aes(label = n), vjust = -0.5, size=3)
The most collisions resulting from alcohol/drug usage occur in Brooklyn and Queens.